Modelling Legitimate Translation Variation for Automatic Evaluation of MT Quality
Authors
Abstract
Automatic methods for MT evaluation are often based on the assumption that MT quality is related to some kind of distance between the evaluated text and a professional human translation (e.g., an edit distance or the precision of matched N-grams). However, independently produced human translations are necessarily different, conveying the same content by dissimilar means. Such legitimate translation variation is a serious problem for distance-based evaluation methods, because mismatches do not necessarily mean degradation in MT quality. In this paper we explore the link between legitimate translation variation and statistical measures of a word's salience within a given document, such as tf.idf scores. We show that the use of such scores extends the N-gram distance measures in a way that allows us to accurately predict multiple quality parameters of the text, such as translation adequacy and fluency. However, legitimate translation variation also reveals fundamental limits on the applicability of distance-based MT evaluation methods and on data-driven architectures for MT.
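The paper itself gives no implementation, but the idea of extending N-gram distance measures with tf.idf word salience can be sketched as follows. This is a minimal, illustrative sketch under assumed details: the function names, the weighting scheme (average tf.idf of the words in each matched N-gram), and the toy document collection are all assumptions, not the authors' actual method.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """tf.idf weight for every (document index, word) pair in a small collection.

    documents: list of token lists, one per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = {}
    for i, doc in enumerate(documents):
        tf = Counter(doc)
        for word, count in tf.items():
            idf = math.log(n_docs / df[word])
            weights[(i, word)] = (count / len(doc)) * idf
    return weights


def weighted_ngram_precision(candidate, reference, weights, doc_index, n=1):
    """N-gram precision in which every matched N-gram is weighted by the
    average tf.idf salience of its words in the reference document, so that
    mismatches on salient content words are penalised more heavily than
    mismatches on frequent function words."""
    cand_ngrams = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref_ngrams = Counter(zip(*[reference[i:] for i in range(n)]))

    def salience(ngram):
        # Words unseen in the reference document get zero weight in this toy
        # sketch; a real system would need some fallback or smoothing.
        return sum(weights.get((doc_index, w), 0.0) for w in ngram) / n

    matched = sum(min(c, ref_ngrams[g]) * salience(g) for g, c in cand_ngrams.items())
    total = sum(c * salience(g) for g, c in cand_ngrams.items())
    return matched / total if total > 0 else 0.0


# Toy usage with an invented two-document reference collection.
reference = "the commission adopted the new directive on industrial emissions".split()
other_doc = "the report describes the weather in the northern region".split()
candidate = "the commission accepted the new directive about industrial emissions".split()

w = tfidf_weights([reference, other_doc])
print(weighted_ngram_precision(candidate, reference, w, doc_index=0, n=1))
```

Note how the weighting makes the score insensitive to variation on low-salience function words (here "the", which occurs in every document and so has zero tf.idf), which is the intuition behind using word salience to absorb legitimate translation variation.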
Similar papers
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are a central component of Machine Translation (MT) engines, as engine development relies on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
Automatic and Manual Metrics for Operational Translation Evaluation Workshop Programme
This paper presents a study on human and automatic evaluations of translations in a French-German translation learner corpus. The aim of the paper is to shed light on the differences between MT evaluation scores and approaches to translation evaluation rooted in a closely related discipline, namely translation studies. We illustrate the factors contributing to the human evaluation of translatio...
Beyond Linguistic Equivalence. An Empirical Study of Translation Evaluation in a Translation Learner Corpus
The realisation that fully automatic translation in many settings is still far from producing output that is equal or superior to human translation has led to an intense interest in translation evaluation in the MT community. However, research in this field has so far not only largely ignored the tremendous amount of relevant knowledge available in a closely related discipline, namely transl...
Comparative Evaluation of Automatic Named Entity Recognition from Machine Translation Output
We report the results of an experiment on automatic NE recognition from Machine Translations produced by five different MT systems. NE annotations are compared with the results obtained from two high-quality human translations. The experiment shows that for recognition of a large class of NEs (Person Names, Locations, Dates, etc.) MT output is almost as useful as a human translation. For other t...
Using Contextual Information for Machine Translation Evaluation
Automatic evaluation of Machine Translation (MT) is typically approached by measuring similarity between the candidate MT and a human reference translation. An important limitation of existing evaluation systems is that they are unable to distinguish candidate-reference differences that arise due to acceptable linguistic variation from the differences induced by MT errors. In this paper we pres...